Painfree LaTeX with Optical Character Recognition and Machine Learning
نویسندگان
چکیده
Recent years have seen an increasing interest in harnessing advancements machine learning (ML) and optical character recognition (OCR) to convert physical and handwritten documents into digital versions. The increasing adoption of digital documents in academia, however, has provided a new layer of complexity to automatic digitization of physical documents. Compared to typical texts written in natural language, academic papers often contain many elements like equations that are notably tougher to recognize with many current OCR techniques. Furthermore, creating digital documents from scratch tends to be difficult too: while typesetting systems like LaTeX help streamline equation formatting and document generation, the process of typesetting complex equations is a arduous and error-prone process. Ultimately, academics aspire to find a solution where they can get the simplicity of generating documents by handwriting while still obtaining the convenience of using digital documents. In this paper, we aim to explore various techniques for converting images of mathematical equations into LaTeX source code.
منابع مشابه
Using Optical Character Recognition to Process Handwritten Musical Notation
This project aims to use machine learning as a way to make the optical character recognition of handwritten modern musical notation more accurate.
متن کاملClassification and Learning Methods for Character Recognition: Advances and Remaining Problems
Pattern classification methods based on learning-from-examples have been widely applied to character recognition from the 1990s and have brought forth significant improvements of recognition accuracies. This kind of methods include statistical methods, artificial neural networks, support vector machines, multiple classifier combination, etc. In this chapter, we briefly review the learning-based...
متن کاملAutomatic Nepali Number Plate Recognition with Support Vector Machines
Automatic number plate recognition is the task of extracting vehicle registration plates and labeling it for its underlying identity number. It uses optical character recognition on images to read symbols present on the number plates. Generally, numberplate recognition system includes plate localization, segmentation, character extraction and labeling. This research paper describes machine lear...
متن کاملRecognition of Hand-Printed Characters via Induct-RDR
The goal of character recognition research is to simplify and automate the development of character recognition algorithms. We describe here an approach based on applying preprocessing to data sets of Latin characters and then applying a machine learning approach to the data sets to build a knowledge base able to classify unseen pre-processed characters. The machine learning method, Induct/RDR,...
متن کاملOptical Character Recognition through Character-set Dependent Probabilities
While optical character recognition is field where machine learning algorithms are easily applied, it might not always be cost effective to do so. While Viola and Jones explored Haar features in their face detection algorithm, selecting from the thousands of features is quite time consuming. This paper compares their feature set with a baseline of pixel learners as well as another set of Haar f...
متن کامل